
    A Pāniniān Framework for Analyzing Case Marker Errors in English-Urdu Machine Translation

    Abstract: Pānini's Kāraka theory takes a syntactico-semantic approach to analyzing a natural language, centred on the arguments of the verb. It provides a framework for representing syntactic relations among constituents in terms of modifier-modified relations, and semantic relations in terms of Kāraka-Vibhakt̪i (semantic role and postposition) pairs. This paper argues that the Pāniniān dependency framework can be used to address machine translation errors, with special reference to case. First, a corpus of approximately 500 English sentences was provided as input to the Google and Bing online MT platforms. Second, the Urdu output sentences were collected in bulk. Third, all output sentences were evaluated, and the errors pertaining to case were categorized against a gold standard. Finally, the Pāniniān dependency framework is proposed for addressing case-related errors in Indian languages.
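    The error categorization step described in this abstract can be sketched as a comparison of the postpositions (vibhakti) an MT system produces for each kāraka role against the gold standard. The sketch below is a minimal illustration under an assumed representation of each verbal argument as a role-to-postposition mapping; it is not the paper's actual tooling, and the role names and markers (e.g. ergative "ne" on the agent) are illustrative.

    ```python
    # Sketch: categorizing case-marker (vibhakti) errors against a gold standard.
    # Assumes a hypothetical representation mapping each karaka role (e.g.
    # 'karta' agent, 'karma' patient) to the surface postposition ('' if none).

    def categorize_case_errors(mt_args, gold_args):
        """Compare MT-output argument markers with gold-standard markers and
        sort the mismatches into three case-error categories."""
        errors = {"missing": [], "spurious": [], "incorrect": []}
        for role, gold_pp in gold_args.items():
            mt_pp = mt_args.get(role)
            if mt_pp is None or (gold_pp and not mt_pp):
                errors["missing"].append(role)      # required marker absent
            elif not gold_pp and mt_pp:
                errors["spurious"].append(role)     # marker added where none belongs
            elif mt_pp != gold_pp:
                errors["incorrect"].append(role)    # wrong postposition chosen
        return errors

    # Example: gold requires ergative 'ne' on the agent and 'ko' on the patient.
    gold = {"karta": "ne", "karma": "ko"}
    mt   = {"karta": "",   "karma": "se"}
    print(categorize_case_errors(mt, gold))
    # {'missing': ['karta'], 'spurious': [], 'incorrect': ['karma']}
    ```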

    Anaphors in Sanskrit

    Proceedings of the Second Workshop on Anaphora Resolution (WAR II). Editor: Christer Johansson. NEALT Proceedings Series, Vol. 2 (2008), 11-25. © 2008 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/7129

    Revisiting Low Resource Status of Indian Languages in Machine Translation

    Machine translation performance for Indian languages is hampered by the lack of large-scale multilingual sentence-aligned corpora and robust benchmarks. In this paper, we present and analyse an automated framework for obtaining such a corpus for Indian-language neural machine translation (NMT) systems. Our pipeline consists of a baseline NMT system, a retrieval module, and an alignment module, and works with publicly available websites such as government press releases. Our main contribution is an incremental method that uses this pipeline to iteratively grow the corpus while improving each component of the system. We also evaluate design choices such as the choice of pivot language and the effect of iterative, incremental increases in corpus size. In addition to the automated framework itself, our work yields a corpus substantially larger than existing corpora available for Indian languages, which in turn gives substantially improved results on the publicly available WAT evaluation benchmark and other standard evaluation benchmarks. Comment: 10 pages, few figures, preprint under review
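    The iterative loop this abstract outlines, in which a baseline translator scores retrieved candidate pairs, high-confidence pairs are added to the corpus, and the baseline is retrained on the result, can be sketched schematically. Everything below is a toy stand-in under stated assumptions: a word-substitution "translator" and a character-similarity alignment score take the place of the paper's NMT, retrieval, and alignment modules.

    ```python
    # Schematic sketch of an iterative corpus-growth loop (toy stand-ins only,
    # not the paper's system).
    from difflib import SequenceMatcher

    def build_translator(corpus):
        """Toy 'NMT' trained on the current corpus: a word-substitution
        table aligned by position (placeholder for a real model)."""
        lex = {}
        for src, tgt in corpus:
            for s, t in zip(src.split(), tgt.split()):
                lex[s] = t
        return lambda sent: " ".join(lex.get(w, w) for w in sent.split())

    def align_score(src, tgt_candidate, translate):
        """Alignment module stand-in: similarity between the baseline
        translation of src and the candidate target sentence."""
        return SequenceMatcher(None, translate(src), tgt_candidate).ratio()

    def grow_corpus(seed_pairs, monolingual_src, monolingual_tgt,
                    rounds=2, threshold=0.6):
        corpus = list(seed_pairs)
        for _ in range(rounds):
            translate = build_translator(corpus)   # "retrain" the baseline
            mined = []
            for src in monolingual_src:
                if any(src == s for s, _ in corpus):
                    continue                       # already mined
                best = max(monolingual_tgt,
                           key=lambda t: align_score(src, t, translate))
                if align_score(src, best, translate) >= threshold:
                    mined.append((src, best))      # keep high-confidence pairs
            corpus.extend(mined)
        return corpus
    ```

    With one seed pair, each round re-derives the translator from the enlarged corpus, so later rounds can mine pairs that the initial baseline could not score confidently.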

    Error Analysis of SaHiT - A Statistical Sanskrit-Hindi Translator

    Abstract: This paper presents a statistical Sanskrit-Hindi translator, SaHiT, and analyses the errors generated by the system. The system is trained on the Microsoft Translator Hub (MTHub) platform and is intended only for simple Sanskrit prose texts. The training set comprises 24K parallel sentences and 25K monolingual sentences, with recent BLEU (Bilingual Evaluation Understudy) scores of 41 and above. The paper discusses the error analysis of the system and suggests possible solutions, and further evaluates the MTHub system with the BLEU metric. For developing the MT systems, the parallel Sanskrit-Hindi text corpora were collected or developed manually from the literature, health, news, and tourism domains. The paper also discusses issues and challenges in developing translation systems for languages like Sanskrit.
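    For readers unfamiliar with the BLEU scores quoted above, the metric combines modified n-gram precisions with a brevity penalty. The following is a simplified sentence-level sketch in pure Python (uniform weights up to 4-grams); it is not MTHub's scorer, and production evaluations use corpus-level variants with standardized tokenization.

    ```python
    # Simplified sentence-level BLEU sketch: geometric mean of modified
    # n-gram precisions (n = 1..4) times a brevity penalty, scaled to 0-100.
    import math
    from collections import Counter

    def bleu(candidate, reference, max_n=4):
        def ngrams(tokens, n):
            return Counter(tuple(tokens[i:i + n])
                           for i in range(len(tokens) - n + 1))
        c, r = candidate.split(), reference.split()
        log_prec = 0.0
        for n in range(1, max_n + 1):
            cand, ref = ngrams(c, n), ngrams(r, n)
            overlap = sum((cand & ref).values())   # clipped n-gram matches
            total = max(sum(cand.values()), 1)
            if overlap == 0:
                return 0.0                         # zero precision at any order
            log_prec += math.log(overlap / total) / max_n
        # Brevity penalty: punish candidates shorter than the reference.
        bp = 1.0 if len(c) >= len(r) else math.exp(1 - len(r) / len(c))
        return 100 * bp * math.exp(log_prec)
    ```

    A perfect match scores 100; a score of 41, as reported for SaHiT, indicates substantial n-gram overlap with the reference translations.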